arXiv:l504.00648vl [math.OC] 2 Apr 2015 


NONSMOOTH TRUST-REGION ALGORITHM WITH APPLICATIONS 
TO ROBUST STABILITY OF UNCERTAIN SYSTEMS 


Pierre Apkarian^, Dominikus Noll* *, Laleh Ravanbod* 


Abstract. We propose a bundle trust-region algorithm to minimize locally Lipschitz 
functions which are potentially nonsmooth and nonconvex. We prove global convergence 
of our method and show by way of an example that the classical convergence argument 
in trust-region methods based on the Cauchy point fails in the nonsmooth setting. Our 
method is tested experimentally on three problems in automatic control. 

Keywords. Bundle • cutting plane • trust-region • Cauchy point • global convergence • 
parametric robustness • distance to instability • worst-case 7?oo-norm 


1. Introduction 

We consider optimization problems of the form 

. ^ minimize f{x) 

^ ' snbject to x ^ C 

where / : M"’ —>■ M is locally Lipschitz, but possibly nonsmooth and nonconvex, and where 
G is a simply structured closed convex constraint set. We develop a bundle trust-region 
algorithm for Q, which uses nonconvex cutting planes in tandem with a suitable trust- 
region management to assure global convergence. The trust-region management is to be 
considered as an alternative to proximity control, which is the usual policy in bundle 
methods. Trust-regions allow a tighter control on the step-size, and give a larger choice 
of norms, whereas bundling is fused on the use of the Euclidean norm. Our experimental 
part demonstrates how these features may be exploited algorithmically. 

Algorithms where bundle and trust-region elements are combined are rather sparse 
in the literature. For convex objectives Ruszcyhski |2B] presents a bundle trust-region 
method, which can be extended to composite convex functions. An early contribution 
where bundling and trust-regions are combined is nans], and this is also used in versions 
of the BT-code j36]. Fuduli et al. jT9] use DC-functions to form a non-standard trust- 
region, which they also use in tandem with cutting planes. A feature which these methods 
share with nonconvex bundle methods like Sagastizabel and Hare [39l HO] or |33] is that 
the objective is approximated by a simply structured, often polyhedral, working model, 
which is updated iteratively by adding cutting planes at unsuccessful trial steps. Our main 
Theoremanalyses the interaction of this mechanism with the trust-region management, 
and assures global convergence under realistic hypotheses. 

The trust-region strategy is well-understood in smooth optimization, where global con¬ 
vergence is proved by exploiting properties of the Cauchy point, as pioneered in Powell 
|85) . For the present work it is therefore of the essence to realize that the Cauchy point 
fails in the nonsmooth setting. This happens even for polyhedral convex functions, the 
simplest possible case, as we demonstrate by way of a counterexample. This explains why 
the convergence proof has to be organized along different lines. 
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The question is then whether there are more restrictive classes of nonsmooth func¬ 
tions, where the Cauchy point can be salvaged. In response we show that the classical 
trust-region strategy with Cauchy point is still valid for upper C^-functions, and at least 
partially, for functions having a strict standard model. It turns out that several prob¬ 
lems in control and in contact mechanics are in this class, which justifies the disquisition. 
Nonetheless, the class of functions where the Cauchy point works remains exceptional 
in the nonsmooth framework, which is corroborated by the fact that it does not include 
nonsmooth convex functions. 

A strong incentive for the present work comes indeed from applications in automatic 
control. In the experimental part we will apply our novel bundle trust-region method to 
compute locally optimal solutions to three NP-hard problems in the theory of systems with 
uncertain parameters. This includes (i) computing the worst-case Hoo-noim of a system 
over a given uncertain parameter range, (ii) checking robust stability of an uncertain 
system over a given parameter range, and (iii) computing the distance to instability of a 
nominally stable system with uncertain parameters. In these applications the versatility 
of the bundle trust-region approach with regard to the choice of the norm is exploited. 

Nonsmooth trust-region methods which do not include the possibility of bundling are 
more common, see for instance Dennis et al. im, where the authors present an axiomatic 
approach, and [T51 Chap. 11 ], where that idea is further expanded. A recent trust-region 
method for DC-functions is |26) . 

The structure of the paper is as follows. The algorithm is developed in section and 
its global convergence is proved in section Applications of the model approach are 
discussed in section where we also discuss failure of the Cauchy point. Numerical 
experiments with three problems in automatic control are presented in section 

Notation 

For nonsmooth optimization we follow na. The Clarke directional derivative of / is 
/°(x, d), its Clarke subdifferential df{x). For a function cj) of two variables di(j) denotes the 
Clarke sub differential with respect to the first variable. For symmetric matrices M ^ 0 
means negative semidefinite. For linear system theory see @51. 

2. Presentation of the algorithm 

In this chapter we derive our trust-region algorithm to solve program (|^ and discuss 
its building blocks. 

2.1. Working model. We start by explaining how a local approximation of / in the 
neighborhood of the current serious iterate x, called the working model of /, is generated 
iteratively. We recall the notion of a first-order model of / introduced in |33) . 

Definition 1. A function 0 : M"' x M” —)■ M is called a first-order model of / on a set Q 
if 4>{-,x) is convex for every x E and the following properties are satisfied: 

(Ml) (f){x,x) = /(x), and di(j){x,x) C df{x). 

(M 2 ) If Hk —t X, then there exist —?• O’*" such that f{yk) < fi{yk,x) + ek\\yk — x||. 

(M 3 ) If Xfe -)■ x,yk-^ y, then limsup^^^ fi{yk,Xk) < (p{y,x). □ 

We may think of 4>{-,x) as a non-smooth first-order Taylor expansion of / at x. Every 
locally Lipschitz function has indeed a first-order model 0**, which we call the standard 
model, defined as 

0 ^( 2 /, x) = /(x) -F /°(x, y - x). 
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Here f°{x,d) is the Clarke directional derivative of / at x in direction d. Following |33| . 
a first-order model 4>{-,x) is called strict at x G i? if the following strict version of (M 2 ) 
is satished; 

(M 2 ) Whenever yk —)■ x, x^ —)■ x, there exist —)■ O’*" such that f{yk) < (l>{yk,Xk) + 

Cfc II l/fc Tfc II. 

Remark 1. Axiom (M 2 ) corresponds to the one-sided Taylor type estimate f{y) < 
(j){y,x) -l-o(|||/ — x||) as 1/ —)■ X. In contrast, axiom (M 2 ) means f{y) < (l>{y,x) -|-o(|||/ — x||) 
as III/ — XII —0 uniformly on bounded sets. This is analogous to the difference between 
differentiability and strict differentiability, hence the nomenclature of a strict model. 

Remark 2. Note that the standard model (j)^ of / is not always strict |3l]. A strict hrst- 
order model 0 is for instance obtained for composite functions f = ho F with h convex 
and F of class if one defines 

0 ( 1 /, x) = h (F(x) + F'{x){y - x)), 

where F'{x) is the differential of the mapping F at x. The use of a natural model of this 
form covers for instance approaches like Powell |35) . or Ruszczyhski [38], where composite 
functions are discussed. 

Observe that every convex / is its own strict model (j){y,x) = f{y) in the sense of 
dehnition As a consequence, our algorithmic framework contains the convex cutting 
plane trust-region method [SB] as a special case. 


Remark 3. It follows from the previous remark that a function / may have several 
hrst-order models. Every model 0 leads to a different algorithm for ([^. 


We continue to consider x as the current serious iterate of our algorithm to be designed, 
and we consider z, a trial point near x, which is a candidate to become the next serious 
iterate x"*". The way trial points are generated will be explained in Section 2.2 


Definition 2. Let x be the current serious iterate and z a trial step. Let ghe a subgradient 
of 0(-, x) at z, for short, g G cii0(^, x). Then the affine function m(-, x) = 0(z, x)+g~^{- — z) 
is called a cutting plane of / at serious iterate x and trial step z. □ 


We may always represent a cutting plane at serious iterate x in the form 

m(-, x) = a + g"' {■ — x), 

where a = m(x, x) = 4>{z, x) F g~^ [x — z) < /(x) and g G di(l){z, x). We say that the pair 
(a, (/) represents the cutting plane m(-,x). 

We also allow cutting planes mo(-, x) at serious iterate x with trial step z = x. We refer 
to these as exactness planes of / at serious iterate x, because mo(x,x) = /(x). Every 
{a,g) representing an exactness plane is of the form (/(x), (/o) with go G df{x). 

Remark 4. For the standard model 0** a cutting plane for trial step z at serious iterate 
X has the very specific form m**(-,x) = /(x) + gj{■ — x), where gz G df{x) attains the 
maximum /°(x, z — x) = gj{z — x). Here every cutting plane m**(-, x) is also an exactness 
plane, a fact which will no longer be true for other models. If / is strictly differentiable 
at X, then there is only one cutting plane m**(-, x) = /(x) -|- V/(x)''"(- — x), the first-order 
Taylor polynomial. 

Definition 3. Let Qk be a set of pairs (a, g) all representing cutting planes of / at trial 
steps around the serious iterate x. Suppose Qk contains at least one exactness plane at x. 
Then Qki'i x) = meiX(^a,g)egk 9~^{' ~ ^) is called a working model of / at x. □ 
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Remark 5. We index working models (j)k by the inner loop counter k to highlight that 
they are updated in the inner loop by adding tangent planes of the ideal model 0 at the 
null steps . 

Usually the (pk are rough polyhedral approximation of 0, but we do not exclude cases 
where the (pk are generated by infinite sets Qk- This is for instance the case in the spectral 


bundle method [201 HUES], see also [7] , which we discuss this in |5.3 


Remark 6. Note that even the choice (pk = (p is allowed in definition and in algorithm 
This corresponds to Q = {{a, g) : g E df{z),a = <p{z,x) + g~^{x — z)}, which is the 
largest poss ible set of cuts, or the set of all cuts obtained from (p. We discuss this case in 


section 
case is 


5.1[ If <p^ is used, then the corresponding working models are denoted (p\. Their 


analyzed in section 5.4[ 


The properties of a working model may be summarized as follows 


Proposition 1. Let (pk{-,x) be a working model of f at x built from Qk and based on the 
ideal model (p. Then 

(i) (pk{,-,x) < (p{-,x). 

(ii) (pk{x,x) = (p{x,x) = f{x). 

(hi) di(pk{x,x) C di(p{x,x) C df{x). 

(iv) If [a, g) G Qk contributes to (pk and stems from the trial step z at serious iterate 
X, then (pk{z,x) = (p{z,x). 

Proof. By construction (pk is a maximum of affine minorants of 0, which proves (i). Since 
at least one plane in Qk is of the form mo{-,x) = (p{x,x) + g~^{- — x) with g G di(p{x,x), 
we have (pi{x,x) > mo{x,x) = (p{x,x) = f{x), which proves (ii). To prove (hi), observe 
that since (pk{-,x) is convex, every g G di(pk{x,x) gives an affine minorant m{-,x) = 
(pk{x,x) + g~^{■ — x) of (pk{-,x). Then m{-,x) < 0(-,x) with equality at x. By convexity 
g G di(p{x,x), and by axiom (Mi) we have g G df{x). As for (iv), observe that every 
cutting plane m(-,x) at z satishes m{z,x) = <p{z,x), hence also <pk{z,x) = (p{z,x). □ 


2.2. Tangent program. In this section we discuss how trial steps are generated. Given 
the current working model (pk{-,x) = max{a + g~^{■ — x) : {a,g) G Qk}, and the current 
trust-region radius Rk, the tangent program is the following convex optimization problem 



minimize 

Qkiy,x) 

(2) 

subject to 

yEC 



\\y - x\\ < Rk 

where | • 

• 1 could be any norm on ML. 

Let y^ be an optimal solution of ([^. By the 


necessary optimality condition there exists a subgradient gk G d{(pk{-,x) + ic) iy^) and 
a vector Vk in the normal cone to B{x,Rk) at G B{x,Rk) such that 0 = gk + Vk, 
where ic is the indicator function of C. We call gk the aggregate subgradient at y^. This 
terminology stems from the classical bundle method, when a polyhedral working model 
is used, see Ruszczyhski [38], Kiwiel m- 

Solutions y^ of (|^ are candidates to become the next serious iterate x^. For practical 
reasons we now enlarge the set of possible candidates. Fix 0 < 0 <C 1 and M > 1, then 
every z^ E C H B{x,M\\x — y^\\) satisfying 

(3) f{x) - (pk{z'^, x)>9 (/(x) - 0fc(/, x)) 

is called a trial step. Note that y^ itself is of course a trial step, because /(x) > (pk{y^,x) 
by the definition of the tangent program. But due to 0 G (0,1), there exists an entire 
neighborhood U of y^ such that every z’^ E U nC is a trial step. 
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Remark 7. The role of here is not unlike that of the Cauchy point in classical trust- 
region methods. Suppose we use a standard working model 0^ and / is strictly differen¬ 
tiable at X. Then = f{x) + Vf{x)~^{- — x). In the unconstrained case 

C = M"’ the solution has then the explicit form y^ = x — ||v/(^)|| ’ ''^hich is indeed 

the Cauchy point as considered in |1T], see also |38l (5.108)]. Condition (|^ then takes 
the familiar form f{x) — (j)\{z^,x) > cr||V/(a;)||Rfc, see jMl (5.110)]. 


2.3. Acceptance test. In order to decide whether a trial step will become the next 
serious iterate we compute the test quotient 

/(x) - f{z^ 
f{x) - 0 fc( 2 ;^x)’ 

which compares as usual actual progress and model predicted progress. For a fixed pa¬ 
rameter 0 < 7 < 1, the decision is as follows. If pk > 7 , then the trial step z’^ is accepted 
as the new iterate x^ = z^, and we call this a serious step. On the other hand, if pk < 7, 
then z^ is rejected and referred to as a null step. In that case we compute a cutting plane 
mk{-,x) at z^, and add it to the new set Qk+i in order to improve our working model. In 
other words, a pair {ak,gk) is added, where pk G di(f){z^,x) and = (j){z'^, x) + gj {x — z^). 


( 4 ) 


Remark 8. Adding one cutting plane at the null step z’^ is mandatory, but we may at 
leisure add several other tangent planes of 4>{-,x) to further improve the working model. 
A case of practical importance, where the (pk are generated by infinite sets Qk of cuts, is 


presented in section 5.3 


Remark 9. In most applications cpk is a polyhedral convex function. If C is also poly¬ 
hedral, then it is attractive to choose a polyhedral trust-region norm ]| ■ ]|, because this 
makes (|^ a linear program. 

Remark 10. For polyhedral (pk one can limit the size of the sets Qk- Consider for 
simplicity C = M”, then the tangent program (j^ is p = min{f : Oi + gj{y — x) — t < 
0,i = 0,..., /c, ]| 2 ; — x]| < Rk}. Its dual is d = niax{^^^]^ AjOj — i?fc]| > 

0, X]i=i — !}• By Caratheodory’s theorem we can select a subset {(oq, po), ■ • •, (^n, Pn)} 
of Qk of size at most n -|- 1 with the same convex hull as Qk, so it is always possible to 
limit \Qk\ < n -|- 1. This estimate is pessimistic. An efficient but heuristic method is 
to remove from Qk a certain number of cuts which were not active at the last z^. In 
the bundle method with proximity control, Kiwiel’s aggregate subgradient |23] allows a 
rigorous theoretical limit of \Qk\ < 3, even though in practice one keeps more cuts in the 
Qk. It is not known whether Kiwiel’s argument can be extended to the trust-region case, 
and the only known bound is n -|- 1, see also [351 Ch. 7.5] for a discussion. 

2.4. Nonsmooth solver. We are now ready to present our algorithm for program ([^. 
See Algorithm next page. 


3. Convergence 

In this chapter we analyze the convergence properties of the main algorithm. 

3.1. Convergence of the inner loop. In this section we prove finiteness of the inner 
loop with counter k. Since the outer loop counter j is fixed, we simplify notation and 
write X = x^ for the current serious iterate, and x'^ = x^~^^ for the next serious iterate, 
which is the result of the inner loop. 
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Algorithm 1. Nonsmooth trust-region method 


Parameters: 0<7<7<l,0<7<r' <1,0<6'<C1, M>1. 

> Step 1 (Initialize outer loop). Choose initial iterate G C. Initialize memory 
trust-region radius as -RJ >0. Put j = 1. 

o Step 2 (Stopping test). At outer loop counter j, stop if is a critical point of 
Otherwise, goto inner loop. 

> Step 3 (Initialize inner loop). Put inner loop counter k = 1 and initialize trust- 
region radius as Ri = Build initial working model 0i(-, a:-^) based on ^i, where at 
least {f{x^),goj) E Q\ for some where E df{x^). Possibly enrich Qi by recycling 
some of the planes from the previous serious step. 

> Step 4 (Trial step generation). At inner loop counter k hnd solution of the 
tangent program 

minimize (j)k{y,x^) 
subject to y E C 

\\y - x^ll < Rk 

Then compute any trial step G Cr\B{x^, M\\x^ —y’^\\) satisfying /(a;-^)— x^) > 
o Step 5 (Acceptance test). If 

f{x^) - 


Pk — 




7, 


/(a;t) - 0fc(c^,a;J) 

put x^^^ = (serious step), quit inner loop and goto step 8. Otherwise (null step), 
continue inner loop with step 6. 

> Step 6 (Update working model). Generate a cutting plane mfc(-, a;-^) = afc-|- 5 'fc(-—x-^) 
of / at the null step at counter k belonging to the current serious step xh Add 
{O'kiQk) to Qk+i- Possibly taper out Qk+i by removing some of the older inactive 
planes in Qk- Build 4’k+i based on Qk+i- 

o Step 7 (Update trust-region radius). Compute secondary control parameter 

f{x^) — 0(t^, x^) 


Pk = 


and put 


Rk+i — 


f{xi) - 0fe(t^,xJ) 
if Pk < 7, 



if Pk > 7- 

Increase inner loop counter k and loop back to step 4. 
o Step 8 (Update memory radius). Store new memory radius 

Rk if pk < R, 

2Rk if pk ^ R- 

Increase outer loop counter j and loop back to step 2. 


RK — 
^j+1 - 


Lemma 1. Let z^ he the trial point at inner loop instant k, associated with the solution 
y^ of the tangent program, and let gk be the aggregate subgradient at y^. Then there exists 
a > 0 depending only on 6 E (0,1), M , and the norm || ■ ||, such that 


( 5 ) 


f{x) - (j)kiz^,x) > cr||5ffc||||x - 
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Proof. Let || ■ || be the norm used in the trust-region tangent program, | ■ | the standard 
Euclidian norm. Since is an optimal solution of Q, we have ^ = gk + Vk, where 
Qk G d{(f)k{-,x) + ic) (y^) and Vk a normal vector to B{x,Rk) at y’^. By the subgradient 
inequality, 

9k - 2 /^) < 0 fc(T, x) - (pkiy’', x) = f{x) - (pkiy'', x). 

Now the angle between the vector y^ — x and the normal Vk to the || ■ ||-ball B{x, Rk) at 
y^ G dB{x, Rk) is strictly less than 90°. More precisely, there exists a' G (0,1), depending 
only on the geometry of the ball B{0, 1), such that cos Z{uk,Vk) > o' for all such vectors 
Uk.Vk- But then gfix — y^) = v~f{y^ — x) > o'\vk\\y^ — x\ > cr"||nA;|||||/^ — x|| for some 
o" G (0,1) still depending only on the geometry of the norm || ■ ||. Invoking (|^ for the 
trial point ^ and using ||x — z^\\ < M\\x — j/^||, we get ^ with a = o"9M~^. □ 

Lemma 2. Suppose the inner loop at x with trial point z^ at inner loop counter k and 
solution y^ of the tangent program (|^ turns infinitely, and the trust-region radius Rk stays 
bounded away from 0. Then x is a critical point of Q . 

Proof. We have pk < 1 for all k. Since liminffc^oo-Rfc > 0, and since the trust-region 
radius is only reduced when pk > 7 , and is never increased during the inner loop, we 
conclude that there exists ko such that 'pk <1 for all k > ko, and also Rk = Rko > 0 for 
all k > /cq. 

As z’^, y^ G B{x, Rko), we can extract an inhnite subsequence fc G /C such that z^ —)■ ; 7 , 
y^ ^ y, k ^ K.. Since we are drawing cutting planes at we have x) = (piz’^, x) = 

mk{z^,x), and then (pkiz^ix) —)■ (p{z,x). Therefore the numerator and denominator in 
the quotient 'pk both converge to (j){x,x) — (f){z,x), fc G /C. Since < 7 < 1 for all k, this 
could only mean 0 (a:, x) — (p{z, x) = 0 . 

Now by condition ^ we have 

(p{x,x) - (pk{y’',x) < {(p{x,x) - (j)k{z^,x)) 0, 

hence limsup^jg^^ 0(a:, x) — (pkiy^^x) < 0. On the other hand, (f)k{y^,x) < (j){x,x) since y^ 
solves the tangent program, hence (pkiy^^x) —>■ (j){x,x), too. 

By the necessary optimality condition for the tangent program (|^ there exist gk G 
di<pk{y^,x) and a normal vector Vk to C H Bi^x^Rko) at y^ such that t) = gk + Vk- By 
boundedness of the y^ and local boundedness of the subdifferential, the sequence gk is 
bounded, and hence so is the sequence Vk- Passing to yet another subsequence k E IC' C IC, 
we may assume gk -E g, Vk ^ v, and by upper semi-continuity of the subdifferential, 
g G dicpiy, x), and v is in the normal cone to Cf\B{x, Rk^f) at y. Since 0 = g+v, we deduce 
that p is a critical point of the optimization program min{0(|/,a:) : y E C (1 B{x,RkQ)}, 
and since this is a convex program, p is a minimum. But from the previous argument we 
have seen that (j){y,x) = (p{x,x), and since x is admissible for that program, it is also a 
minimum. A simple convexity argument now shows that x is a minimum of ([^. □ 

Lemma 3. Suppose the inner loop at x with trial point z^ and solution y^ of the tangent 
program at inner loop counter k turns forever, and liminffc^oo Rk = t). Then x is a critical 
point of ([^ . 

Proof. This proof uses (|^ obtained in Lemma We are in the case where Pk Z ^ for 
inhnitely many k E M. Since Rk is never increased in the inner loop, we have Rk -E 0. 
Hence y^, x as /c —)■ 00 . 

We claim that (f>k{,z^ix) -E /(x). Indeed, we clearly have limsup^^g^ 0 ^( 2 ;^,x) < 
limsup;._^o^ x) = limfc^oo t) = /(x). On the other hand, the exactness plane 
mo(-,x) = /(x) + Po {■ — x) is an affine minorant of 0 fc(-,x) at all times k, hence /(x) = 
limk^oo , x) < liminffc^oo x), and the two together show (f)k{z^,x) -E /(x). 
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By condition (|^ we have x) > ||a;—^^||, where gk ^ d (0fc(-, x) + ic) {y^) 

is the aggregate subgradient. Now assume that > r; > 0 for all k. Then /(x) — 
(j)k{z’‘,x) > ar]\\x — z^\\. 

Since z^ —)■ x, using axiom (M 2 ) there exist —?• 0+ such that f{z^) — 0(z^,x) < 

efc||x — z^\\. But then 


Pk 


f{z^) - (p{z^,x) 
f{x) - (j)k{z’^,x) 


< Pfc + 


efc||x — 

ari\\x — z^ 


Pk + efc/(crh)- 


Since —?• 0, < 7 , we have limsup^j^go Pk ^ 1 <1-, contradicting the fact that pfc > 7 

for inhnitely many k. Hence \\gk\\ > 7 > 0 was impossible. 

Select k ^ fC such that g^ —t 0. Write gk = Pk + Qk with G di(j)k{y^,x) and 
Qk G Nc{y^). Using the boundedness of the y^ extract another subsequence k E K.' such 
that Pk p, qk ^ q- Since y^ —>■ x, we have q G Nc{x). We argue that p G df{x). 
Indeed, for any test vector h the subgradient inequality gives 


Plh< (j)k{y^ + h,x)- (t)k{y^,x) < (l){y^ + h,x) - (l)k{y^,x). 

Since (f)k{y^,x) —)■ /(x) = 0(x, x), passing to the limit gives 

p~^h < (p{x + h,x) — 0(x,x), 

proving p G di(l){x,x) C df{x). This proves that x is a critical point of ([^. □ 


3.2. Convergence of the outer loop. In this section we prove our main convergence 
result. 


Theorem 1. Suppose f has a strict first-order model 0. Let x^ E C be such that {x G C : 
f{x) < /(x^)} is bounded. Let x^ E C be the sequence of iterates generated by Algorithm 
Then every accumulation point x* of the x^ is a critical point of ([^ . 

Proof. 1) Without loss we consider the case where the algorithm generates an inhnite 
sequence x^ E C of serious iterates. Suppose that at outer loop counter j the inner 
loop hnds a successful trial step at inner loop counter kj, that is, zU = where the 
corresponding solution of the tangent program is x^~^^ = y^L Then pk- >7, which means 

(6) /(x^) - /(x^+^) > 7 (/(x^) - 0fc^(x^+\x^)) . 

Moreover, by condition ([^ we have ||xMi — x^\\ < M||x-^’''^ — x^\\ and 

(7) /(x^) - 0fc^(x^+\x^) > 9 (/(x^) - 0fc^.(F+\x^)) , 
and combining ([^ and ([^ gives 

(8) /(x^) - f{x^^^) > 70 (/(x-^) - 0fc^.(F+\x^)) . 

Since pU = xi+^ is a solution of the kjth. tangent program ([^ of the jth inner loop, there 
exist gj E d {((>kj{-, x^) + ic) (x-^’*'^) and a unit normal vector Vj to the ball B{x^, Rk^) at 
such that 

93 + hjWvj = 0 . 

We shall now analyze two types of inhnite subsequences, those where the trust-region 
constraint is active at x^i^ and those where it is inactive. 

2) Let us start with the simpler case of an inhnite subsequence x-^, j E J, where 
||xt — x-^'+i|| < Rkp i.e., where the trust-region constraint is inactive. There exist pj E 
c?i0fc^(x-^+^,X'’) and qj E Nc{x^~^^) such that 

0 = Pi + 93 - 
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By the subgradient inequality, applied to pj G we have 

-qj{x^ -5^+^) =pj{x^ < (t)k^{x\x^) - 

= f{x^) - {f{x^) - /(x^+^)) , 

using (|^. Since pJ{x^ — x^~^^) = qj{x^^^ — x^) > 0 by Kolmogoroff’s inequality, we deduce 
summability J2j&jPj ~ < oo, hence pJ{x^ — —)■ 0, j G J, and then also 

qJ {x^ — x^^^) —)■ 0. 

Let h be any test vector, then 

p]h< cl)k. + h,x^)- (l)k^ x^) 

< + h,x^) - f{x^) + f{x^) - (j)kj{x^^^,x^) 

< + h,x-^) — f{x^) + ~ ■ 

Now let h' be another test vector and put h = x^ — x^^^ + h'. Then on substituting this 
expression we obtain 

pJ {x^ — + p]h' < 4>{x^ + h\ x^) — f{x^) + ~ ■ 

Passing to the limit, we have pJ{x^ —x^~^^) —)■ 0 by the above, and f{x^) — f{x^^^) —?■ 0 by 
the construction of the descent method. Moreover, hmsupjgj0(x-^ + h',x-^) < (j){x* + h',x*) 
by axiom (M3) and pj —)■ p for some p. That shows 

p~^h' < (j){x* + h', X*) — f{x*) = (j){x* + h', X*) — 0(x*, X*). 

Since h' was arbitrary and 0(-, x*) is convex, we deduce p G 9i0(x*, x*), hence p G df{x*) 
by axiom (Mi). 

Now observe that x-^+^ —>■ x and qj ^ q & Nc{x). We wish to show that q G Nc{x*). 
Since qJ (x-^ — x-^’*'^) —)■ 0, we have q~^{x* — x) = 0, but g 7 ^ 0 and x* — x 7 ^ 0. Now 
for any element x E C we have g^(x — x) > 0 by Kolmogoroff’s inequality. Hence 
q~''(x* — x) = q~'~(x — x) + q~^{x* — x) = q^{x — x) > 0, so Kolmogoroff’s inequality holds 
also at X*, proving q G Nc{x*). We have shown that 0 = p + g G d {((){■, x*) + ic) {x*), 
hence x* is a critical point of ([^. 

3) Let us now consider the more complicated case of an inhnite subsequence, where 
||xt — a;t+i|| = with Qj 7^ 0. In other words, the trust-region constraint is active at 
xM^. Passing to a subsequence, we may assume x^ -E x*, and we have to show that x* is 
critical. 

Let Uj be the unit vector Uj = (x^i _ x^)/\\x^^^ ~ Then if the norm || ■ || coincides 
with the Euclidian norm | ■ |, we have Uj = Vj. For other norms this is no longer the case, 
but for any such norm there exists a > 0 such that ujvj > a > 0 for all j. Then 

gj{x^ — = —\\x^ — ff^^WgJuj = \\x^ — x^^^\\\\gj\\vjuj > cr||pj||||x-^ — x-^’''^||. 

By the subgradient inequality, and using x-’ , x^i g C, we have 

gj{x^ -F+^) < 0fc^.(x^x^) -0fc.(F+\x^) = /(x^) - 0fc^.(F+\x^). 

Altogether 

(9) /(x^) -0fc^-(F+\x^) > a||pj||||x^ -F+^||. 

Combining this with ([^ gives 

\\9j\\\\x^ -x^+1 < a-^'j-^ 9 -^ (/(x^) -/(x^+^)) . 
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Summing both sides from j = 1 to j = J gives 

.7 

j=i 

Since the values f{x^) are decreasing and {x E C : f (x) < f {x^)} is bounded, the sequence 
x^ must be bounded. We deduce that the right hand side is bounded, hence the series on 
the left converges: 

OO 

(10) ^ Ikill < OO. 

7 = 1 


In particular, this implies ||5 'j||—)■ 0. Using Hx-^ — we 

also have ||5'j||||x-^ — —)■ 0. 

We shall now have to distinguish two subcases. Either there exists a subsequence 
J' C J such that 0 as j G J', or R^. > i?o > 0 for all j G J. The second subcase 

is discussed in 4) below, the first is handled in 5) - 6). 

4) Let us consider the sub-case of an infinite subsequence j E J where \\x^ — = 

Rk, > Ro > 0 for every j E J. Going back to ([l0|), we see that we now must have gj ^ 0, 

as x^ — x^~^^ 74 0. Let us write gj = pj -|- qj, where pj E x^) and Qj E Nc{x^~^^)- 

Then 


pj(x^ - x^+^) < 0 fc^.(x^x^) - (/)fc^.(x^+\x^) < 7 ^ 6 ^ ^ • 


Now gj (x^ —xa+^) = pJ(x^ — xa+^) -1-gJ(x^ —xa+^) < pJ(x^ —xa^^), because Kolmogoroff’s 
inequality for x^^^ G C and qj E Nc{x^^^) gives qJ— x^) >0. Hence we have 


gjix^ 


X- 


■7+1' 


< p7 (x-^ 


X' 


.7+1 


7 


so pJ (x^ — xa+^) —)■ 0, because the lefthand term and the righthand term both converge 
to 0. As a consequence, we also have qJ (x^ — xa+^) —)■ 0. 

Now observe that the sequence x^ G G is also bounded, because {x G G : /(x) < /(x^)} 
is bounded and the x^ form a descent sequence for /. Let us say ||x^ — x^H < K for all j. 
We argue that the pj are then also bounded. This can be shown as follows. Let h be a 
test vector with ||h|| = 1. Then 


p]h< 0fc,- + h, x^) - x^) 

< (f){x^^^ + h, x^) — moj(x'^^^, x-^) 

= (j){x^^^ + h,x^) — f{x^) — gQj{x^~^^ — x^) 

< G + |/(x^)| + WgojlWW - x^^^l 


where G :=max{0(M,u) : ||m— x^|| < ||u—x^|| < K} < 00 and where poj ^ df{x^) 

by the definition of the exactness plane at xL But observe that df is locally bounded by 
|37], so IIpojII < K' < 00 . We deduce ||pj|| <C+ \ f{x^)\ -|- K'(2K + M) < 00 . Hence the 
sequence pj is bounded, and since gj = pj + qj ^ 0 by the above, the sequence qj is also 
bounded. 

Therefore, on passing to a subsequence j E J', we may assume x^ —>■ x*, x^~^^ —)■ x, 
Pj p, qj —?• q. Then q E Nc{x). Now from the subgradient inequality 

p]h< (j)k,{x^^^ + h,x^) - (j)kj{ff^^,x^) 

< -1- h,x^) - /(x^) -t- /(x^) - 

< (j){x^~^^ + h,x^) — 4>{x^,x^) + 'y~^6~^ {f{x^) ~ fix^'^^)) > 
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where we use ([^, < 0, and acceptance p^. > 7, and where the test vector h is 

arbitrary. Let h' another test vector and put h = + h'. Substituting this gives 

(11) pj {x^ — + P~]h' < 4>{x^ + h', x^) — 4>{x\ x^) + ~ ■ 


Now pj{x^ — = {pj + qj)^{x^ — + qj— x^) > {pj + qj)^{x^ — using 

Kolmogoroff’s condition for qj G Nc{x^~^^)- Therefore, on passing to the limit in (11), 
using {pj+qj^' 

(j){x* + h',x*), which follows from axiom (M 3 ), we hnd 


(x-^—2;i+i) 0, y(a;t)—/(x-^'+i) —0, Pj —)■ p and limsup^gj, 0(x-^+h', x^) < 


p~^h' < (j){x* + h\x*) — (j){x*,x*). 


Since h' was arbitrary, we deduce p G 9i0(a:*, a:*), and by axiom (Mi), p G df{x*). 

It remains to show q G Nc{x*). Now recall that qJ{x^ — x^i) —)• 0 was shown at the 
beginning of part 4), so q~^ {x* — x) =0. Given any test element x E C, Kolmogoroff’s 
inequality for g G Nc{x) gives q~^{x—x) > 0. But then g'''(x*—a:) = q~^{x—x)+q~^{x*—x) = 
q~^{x — x) > 0, so Kolmogoroff’s inequality also holds for q at x*, proving q G Nc{x*). 

With q G Nc{x*) and g = p + g = 0, we have shown that x* is a critical point of 
That settles the case where the trust-region radius is active and bounded away from 0. 

5) It remains to discuss the most complicated sub-case of an inhnite subsequence j G J, 
where the trust-region constraint is active and Rkj —)■ 0. This needs two sub-sub-cases. 
The hrst of these is a sequence j E J where in each jth outer loop the trust-region radius 
was reduced at least once. The second sub-sub-case are inhnite subsequences where the 
trust-region radius stayed frozen {R^j = R^.) throughout the jth inner loop for every 
j E J. This is discussed in 6 ) below. 

Let us hrst consider the case of an inhnite sequence j E J where Rk^ is active at x-^’*'^, 
and Rkj -E 0, j E J, such that during the jth inner loop the trust-region radius was 
reduced at least once. Suppose this happened the last time before acceptance at inner 
loop counter kj — Uj. Then for j G J, 

Rkj = Rkj-l = ■ ■ ■ = Rkj-Vj = 2^kj-Uj-l- 

By step 7 of the algorithm, that implies 


Pkj—Uj ^ Ti Pkj—Uj ^ b- 

Now ||x-^+^ — x-^ll < Rkj and — x-^|| < Rk^-u^-i = ‘^Rkp hence x-^+^ — Q, 

xi — —)■ 0, j G J”. From axiom (M 2 ) we deduce that there exists a sequence ej —)■ O’*" 

such that 

By the dehnition of the aggregate subgradient 'pj E 9 (•, x-^)-|-ic) and 

Lemmawe have /(x-^) — > (T||gj||||x-^ — z’^^~''^\\. 

Recall that x^ -E x* and that we have to show that x* is critical. It suffices to show that 
there is a subsequence j E J' with pj —)■ 0. Assume on the contrary that \\Pj\\ > 7 > 0 for 
every j E J. Then 

/(x^) - 4>k,-.,{z^^-''\x^) > paWz^^-'^^ - xi. 

Now 

f[z^j-''j) - (t){z^J-''j ^X^) ejWz’^i-'^i - x^W ^ 

Pkj Vj Pkj i/j + — (f)kj-Uj{z^^~^^ ■,X^) ~ — X-^II ^ 

for j E J sufficiently large, contradicting Pkj-uj > 7- This shows that there must exist 
a subsequence J' such that pj —)■ 0, j G J'. Passing to the limit j E J', this shows 
0 G 9 (0(-, X*) -I- ic) (x*), hence x* is critical for ([^. 
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6) Now consider an infinite subseqnence j E J where —)■ x*, the trnst-region radius 
Rkj was active at x^~^^ when x^~^^ was accepted, Rk^ —t 0, but during the jth inner loop 
the trust-region radius was never reduced. In the classical case this can only happen when 
x^^^ at j is immediately accepted, but with bundling this could also happen when the 
inner loop adds cutting planes for a time, while the test in step 7 keeps Rk+i = Rk in the 
inner loop. Since Rk^ —)• 0, the work to bring the radius to 0 must be put about somewhere 
else. For every j G J define j' G N to be the largest index j' < j such that in the j'th inner 
loop, the trust-region radius was reduced at least once. Let J' = {f : j G J}, where we 
understand j h-)■ j' as a function. Passing to a subsequence of J, J', we may assume that 
x^ —)• x' and gj' —)■ 0, because the sequence J' corresponds to one of the cases discussed 
in parts 2) - 5). Passing to jet another subsequence, we may arrange that the sequences 
J, J' are interlaced. That is, j' < j < j'^ < < j~^~^ <■■■—)■ oo. This is because 

j' tends to cxD as a function of j. 

Now assume that there exists g > 0 such that \\gj\\ > g for all j G J. Then since 
x^ —>■ X*, we also have x^~^^ —>■ x*. Fix e > 0 with e < g. For j ^ J large enough we have 
\\gj'\\ < e, because gj> —)■ 0, j' G J', and as j gets larger, so does j'. That means in the 
interval there exists an index j" G N such that 

il5'i"ll < IlS'ill > e for ah i = j" + 1 ,..., j. 


The index j" may coincide with j', it might also be larger, but it precedes j. In any case, 
j H-)■ j" is again a function on J and defines another infinite index set J" still interlaced 
with J. 

Now recall from part 3), estimate (10), and ||x-^ —xl’''^|| < M||xl —x-^+^||, that for some 
constant c > 0 
j 


\\9i\ 

i=j"+l 


X — X 


,i+l| 


< c /(x^ + ) - /(x^+ ) W 0 (j G J, j ^ CX), j ^ j"). 


Since by construction ||5fj|| > e for all i G [f -|- l,...,j], and that for all j G J, the 
sequence X]i=j"+i 11^* ~ x*+^|| —)■ 0 converges as j G J, j —)■ cxd, and by the triangle 
inequality, x^”^^ — x^^^ —)■ 0. Therefore x^"^^ —)■ x*. Since gy/ G d{f ^-ic){x^"^^)^ passing 
to yet another subsequence and using upper semi-continuity of the subdifferential, we 
get gj" g E d{f + ic){x*). Since \\gj"\\ < e, we have ||^|| < e. It follows that 
d{f + ic){x*) contains an element of norm < e. As e < g was arbitrary, we conclude that 
0 G d{f + ic){x*). That settles the remaining case. □ 


4. Stopping test 


A closer look at the convergence proof indicates stopping criteria for algorithm As 
is standard in bundle methods, step 2 is not executed as such but delegated to the inner 
loop. When a serious step x^~^^ is accepted, we apply the tests 


X-' 


x- 


•i+ii 


< toll. 


/(xl) - /(x^+^) 


in tandem with 


l + ||x^j| l + 

min{||c/j||, \\gj>\\} ^ 


< top 


1 + |/(xJ)| 

Here gj is the aggregate subgradient at acceptance. In the case treated in part 6) of 
the proof we had to consider the largest index j' < j, where the trust-region radius was 
reduced for the last time. If in the inner loop at x^ leading to x^~^^ the trust-region radius 
was not reduced, we have to consider both aggregates, otherwise ||5fj||/(l -|- ||xl||) < top 
suffices. If the three criteria are satisfied, then we return x^~^^ as optimal. 
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On the other hand, when the inner loop has difficulties finding a new serious iterate, 
and if a maximum number /Cmax is exceeded, or if for z/max consecutive steps 


\x-^ 


l + ||x^'|| 


in tandem with 


< toll. 


\\9k 


f{x^) - f{z^) 

1 + |/(x^')| 

< tola 


< toh 


1 + \f{x^)\ 

are satisfied, where Qk is the aggregate subgradient at y^, then the inner loop is stopped 
and x^ is returned as optimal. In our tests we use /Cmax = 50, Umax = 5, tofi = tol 2 = 10“®, 
tola = 10 “®. Typical values in algorithm [I] are 7 = 0 . 0001 , 7 = 0 . 0002 , F = 0 . 1 . 


5. Applications 


In this section we highlight the potential of the model-based trust-region approach by 
presenting several applications. 

5.1. Full model versus working model. Our convergence theory covers the specific 
case (pk = (j)-i which we call the full model case. Here the algorithm simplifies, because 
cutting planes are redundant, so that step 6 becomes obsolete. Moreover, in step 7 the 
quotient pk always equals 1 , so the only action taken is reduction of the trust-region 
radius. This is now close to the rationale of the classical trust-region method. 


5.2. Natural model. For a composite function f = g o F with g convex and F of class 
the natural model is x) = g {F{x) + F\x){y — x)), because it is strict and can be 
used in algorithmic In the full model case (fk = 4>i om' algorithm reduces to the algorithm 
of Ruszczyhski |38l Chap. 7.5] for composite nonsmooth functions. 


5.3. Spectral model. An important field of applications, where the natural model often 
comes into action, are eigenvalue optimization problems 

minimize Ai (J^(a;)) 

^ ' subject to X ^ C 

where : M” —>■ S'” is a class C^-mapping into the space ofmxm symmetric or Hermitian 
matrices S’”, and Ai(-) the maximum eigenvalue function on S’”, which is convex but 
nonsmooth. Here the natural model is (j){y,x) = Ai {J^{x) + J^'{x){y — x)), where J^' is 
the differential of T. Note that nonlinear semidefinite programs 

minimize f{x) 

(13) subject to J^{x) ^0 

xeC 


are special cases of (12) if we use exact penalization and write (13) in the form 


minimize /(x)-f cmax { 0 , Ai (J^(x))} 
subject to X & C 


with a suitable c > 0. Namely, this new objective may be written as the maximum 
eigenvalue of the mapping 


P{x) = 


f{x) 0 

0 f{x)Im + cJ^{x) 


e S^+’”. 


Let us apply the bundling idea to (12) using the natural model (j). Here we may build 
working models cfk generated by infinite sets Qk of cuts (a, g) from 0 , and still arrive 
at a computable tangent program. Indeed, suppose y^ is a null step at serious iterate x. 
According to step 6 of algorithmic we have to generate one or several cutting planes at y^. 
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This means we have to compute gk € 9Ai + F\x){- — x)) {y^). Now by the general¬ 

ized chain rule the subdifferential of the composite function y ^ \i {J^{x) + J^'{x){y — x)) 
at y is T'^xydXi (-T(x) -|- J^'{x){y — x)), where d\i is now the convex subdifferential of 
Ai in matrix space S™, i.e., 

dXi{X) = {GeS^ :Gh 0,tr(G') = = Ai(X)} 

with X •¥ = tr(XF) the scalar product in S™. Here X'{x)* : —>■ M” is the adjoint of 
the linear operator X'{x). It follows that every subgradient g of the composite function 
is of the form 


(14) 


g = r{xyG, G e dXi {X{x) + r{x){y - x)) 


The corresponding a is a = Ai (X(x) + X’{x){y — x))+g^[x—y). As soon as the maximum 
eigenvalue Ai(X) has multiplicity > 1, the set 9Ai(X) is not singleton, and we may 
therefore add the entire subdifferential to the new set Qk+i- 

Let y^ be a null step, and let Qr be an m x tk matrix whose tk columns form an 
orthogonal basis of the maximum eigenspace of -T(x) -|- X'{x){y^ — x). Let Yk be a 
tk X tfc-matrix with Yk = Tj?", Yk Y 0, tr(Yfc) = 1, then subgradients (14) are of the form 


Gk = QkXkQk- Therefore all pairs {ar,gr{Yr)) G Qk are of the form 

ar = Ai {X{x) + r{x){y^ - x )), gr{Yr) = G, = Q,1;Q 


T 
r ’ 


indexed by ^ 0, tr(yj.) = 1, ^ S*’’ stemming from older null steps r = 1,..., k. The 

trust-region tangent program is then 

minimize max -|- Ai (QrX \x){y - yyOl) 

r=\,...,k ^ ' 

subject to y & G, \\y — x|| < R 


This is a linear semidefinite program if a polyhedral or a conical norm is used, and if G 
is a convex semidefinite constraint set. 

We can go one step further and consider semi-infinite maximum eigenvalue problems 
as in |7], as this has scope for applications in automatic control. It allows us for instance 
to optimize the iLoo-norm, or more general IQC-constrained programs, see [6]. 


5.4. Standard model. The most straightforward choice of a model is the standard model 

y\y,x) = fix) + f%x,y - x), 

as it gives a direct substitute for the first-order Taylor expansion of / at x. Here the full 
model tangent program (|^ has the specific form 

minimize /(x) -)- /°(x, y — x) 

(15) subject to y E G 

\\y - x\\ < Rk 

and if a polyhedral working model is used to approximate (j)'^ via bundling, then we 
get an even simpler tangent program of the form 

minimize /(x) -|- max gj {y — x) 

2=1,...,fc 

(16) subject to y & G 

\\y - x\\ < Rk 


where gi G df{x). If a polyhedral norm is used and C is a polyhedron, then (16) is just 
a linear program, which makes this line attractive computationally. 
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Remark 11. Consider the unconstrained case C = M"" with 0^ = 0**, then y^ = x — 
Rkg{x)/\\g{x)\\, where g{x) = argmin {|| 5 f|| : g G df{x)}, and this is the nonsmooth 

gedfix) 

steepest descent step of length at x. In classical trust-region algorithms the steepest 
descent step of length R^ is often chosen as the hrst-order Cauchy step. 


This raises the following natural question. Can we use the solution of of (15), or 


(16), as a nonsmooth Cauchy point? Since we do not want to keep the reader on the 
tenterhooks too long, here is the answer; no we can’t. Namely, in order to be allowed to 
use the standard model in Algorithm and the solution of (15), (16) as a Cauchy point 
for other models, 0** has to be strict, because this is required in Theorem A sufficient 
condition for strictness of 0** is given in |32]. We need the following 


Definition 4 (Spingarn j33], Rockafellar-Wets |3Z])- A locally Lipschitz function / : 
M” —)■ M is lower-C^ at Xq G M"" if there exist a compact space K, a neighborhood U of 
Xo, and a mapping F : M"" x IK —)■ M such that 

(17) /(x) = maxF(x,|/) 

ySK 

for all X G t?, and F and dF/dx are jointly continuous. The function / is said to be 
upper-C^ at xq if —/ is lower-C^ at xq. □ 


Lemma 4. (See [32] j. Suppose f is locally Lipschitz and upper . Then the standard 
model (j)^ of f is strict. □ 

Example 1. The lightning function / : M —)• M in [23] is an example where 0** is strict, 
but / is not upper C^. It is Lipschitz with constant 1 and has df{x) = [—1,1] for every 
X. The standard model of / is strict, because for all x, y there exists p = p(x, y) G [—1,1] 
such that 


f{y) = f{x) + p\y - x| < /(x) sign(|/ - x){y - x) 

< f{x) + /°(x, y - x) = (j)\x,y - x), 

using the fact that sign(j/ — x) G df{x). At the same time / is certainly not upper-C^, 
because it is not semi-smooth in the sense of |28j . 

When using the standard model 0** in Algorithm we expect the trust-region method 
to coincide with its classical antecedent, or at least, to be very similar to it. But we expect 
more! Let be the class of nonsmooth locally Lipschitz functions / which have a strict 
standard model (j)K Suppose a subclass of leads to simplihcations of algorithmic 
which reduce it to its classical alter ego. Then we have a theoretical justihcation to say 
that functions / G =5^', even though nonsmooth, can be optimized as if they were smooth. 

Following Borwein and Moors [9], a function / is called essentially smooth if it is locally 
Lipschitz and strictly differentiable almost everywhere. The lightning function of example 
IC is a pathological case, which is differentiable almost everywhere, but nowhere strictly 
differentiable. In practice we expect nonsmooth functions to be essentially smooth. This is 
for instance the case for semi-smooth functions in the sense of |28( . for arc-wise essentially 
smooth functions, or for pseudo-regular functions in the sense of [9]. 

Proposition 2. Let f be essentially smooth. Let x^ ^ C be such that{x G C : f{x) < 
/(x^)} is bounded. Suppose the standard model (j)^ is used in algorithm^ Let trial points 
& C satisfying ([^ in step 4 are drawn at random and independently according to 
a continuous probability distribution on C. Then with probability one the steps of the 
algorithm are identical with the steps of the classical trust-region algorithm. Moreover, if 
is strict, then every accumulation point of the sequence x^ is critical. 
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Proof. Since there exists a full neighborhood U of such that every G f/ fl C is 
a valid trial point, and since the elements in f/ fl C are with probability 1 points of 
strict differentiability, the entire sequence consists with probability 1 of points of strict 
differentiability. □ 


Note that we should not expect the themselves to be points of differentiability, let 
alone strict differentiability. In fact the y^ will typically lie in a set of measure 0. For 
instance, if C is a polyhedron, then y^ is typically a vertex of C, or a vertex of the 
polyhedron of the linear program (16). 

Proposition applies in particular when / is upper because upper C'^-functions 
are essentially smooth. However, for upper functions we have the following stronger 
result. A similar observation in the context of bundle methods was hrst made in 1151. 


Lemma 5. Suppose f is locally Lipschitz and upper-C^ and the standard model 0** is used 
in algorithm^ Then we can choose the cutting plane mk{-,x) = f{x) + — x) in 

step 6 with G df{x) arbitrarily, because f°{x, z^ — x) — gj^z^ — x) < ek\\z^ — x|| holds 
automatically for some Ck —t O’*" in the inner loop at x, and f°{x\x^^^ — x^) — gj {x^^^ — 
x^) < ej\\x^^^ — x^\\ holds automatically for some Cj —?• O’*" in the outer loop. 

Proof. Daniilidis and Georgiev m Thm. 2] prove that an upper function is super¬ 
monotone at X in the following sense: For every e > 0 there exists h > 0 such that 
idi ~ g 2 )~^{xi — X 2 ) < e||a:i — X2II for all Xi & U and gi G df{xi). Hence for sequences 
x\y^ —)■ X we hnd Cj —)■ 0+ such that {g* — g^Y{x^ — y^)< (-j\\y^ ~x^\\ for all g* G df{y^), 
gj G df{x^). Choosing g* such that f°(x^, y^ — x^) = gY— x^) then gives the result. □ 

As a consequence we have the following 


Theorem 2. Suppose f is upper-C^, x^ G C, and {x G C : f{x) < f{x^)} is bounded. 
Suppose the classical trust-region algorithm is used, that is, the only cutting plane in step 6 
chosen at x is an arbitrarily exactness plane, and in step 7 the trust-region radius is reduced 
whenever a null step occurs. Then every accumulation point of the seguence of serious 
iterates x^ is a critical point of ([^. Moreover, if f satisfies the Kurdyka-Lojasiewicz 
ineguality, then the x^ converge to a single critical point x* of f. 


Proof. By Lemma the proof of Theorem applies regardless how we choose cutting 
planes from (j)K We exploit this by choosing them in the simplest possible way, namely we 
take only one exactness plane and keep it all the time. If / is differentiable at x then our 
only choice is m{-,x) = f {x) -{-V f {xY{■ — x), otherwise we take m{-,x) = f (x)g~^{■ — x) 
with an arbitrary g G df{x). This makes step 6 redundant and reduces step 7 to the usual 
modihcation of the trust-region radius. And this is now just the classical trust-region 
strategy, for which we then have subsequence convergence by Theorem [Tj 

It remains to show that under the Kurdyka-Lojasiewicz inequality the x^ converge even 
to a single limit. This can be based on the technique of PIHIES]. □ 

Remark 12. An axiomatic approach to trust-region methods is Dennis et al. ca. and 
the idea is adopted in m Chap. 11]. The difference with our approach is that 0 in 
[I71[I3] has to be jointly continuous, while we use the weaker axiom (M3), and that their 
/ has to be regular, which precludes the use of the standard model 0**, hence makes it 
impossible to use the Cauchy point. Bundling is not discussed in these approaches. 

On the other hand, the authors of im, n do allow non-convex models, while in our 
approach 0(-, x) is convex because we want to assure a computable tangent program, and 
be able to draw cutting planes. Convexity of 0(-, x) could be relaxed to 0(-, x) being lower 
C^. For that the downshift idea |2Hll3l] would have to be used. 
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5.5. Delamination problem. Contact mechanics is a domain where nonsmooth op¬ 
timization programs arise frequently. When potential energy is minimized under non¬ 
monotone friction laws, then programs with lower-C^ functions arise. On the other 
hand, quasi-static delamination problems lead to minimization of upper-C^ criteria, see 
dniiMiE] for more information. 

5.6. Model for splitting. Suppose we wish to optimize a function f = g + h where g is 
differentiable and h is convex. Then a model 0 for / is x) = g{x) -|- Vg{x)~^{y — x) -|- 
h{y) = (j)^g{y,x) + h{y). Indeed, for the differentiable g the hrst-order Taylor expansion 
is natural, and the convex h is its own strict model. Cutting planes are now sums of 
cutting planes of the two model components. Algorithm [T] based on 0 could then be an 
alternative to a splitting technique, in particular, as ours carries over easily to the case 
when h is lower-C^. 


5.7. Failure of the Cauchy point. We will show by way of an example that the 
classical trust-region approach based on the Cauchy point fails in the nonsmooth case. 
We operate algorithm with the full standard model 0**, compute the Cauchy point y^ 
via (15) based on the Euclidian norm, and use = y^ as the trial step. This corresponds 
essentially to a classical hrst-order trust-region method. 

The following example adapted from |23| can be used to show the difficulties with this 
classical scheme. We dehne a convex piecewise affine function / : —)■ M as 


/(x) = max{/o(x), /±i(x), f± 2 {x)} 

where x = (xi,X2) and 

/o(x) = -100,/±i(x) = ±2xi + 3x2,f±2{x) = ±5 xi + 2 x 2 . 

The plot below shows that part of the level curve [/ = a] which lies in the upper half 
plane X2 > 0. It consists of the polygon connecting the hve points (—1,0), (—ip, ff), 
(0, |), (ip, ff), (f) 0). We are interested in that part of the lower level set [f < a], which 
lies within the gray-shaded dragon-shaped area inside the polygon [/ < a], and above the 
xi-axis. 



Consider the exceptional set N = = fj = f}, whose intersection with the upper 

half-plane X2 > 0 consists of the three lines xi = 0, X2 = ±3xi. Then for x ^ N the 
gradient V/(x) is unique. We will generate a sequence x^ of iterates which never meets 
N, so that (f>^{y, x) = f (x) + 'V f {x^{y — x) with V/(x) G {±(2,3), ±(5, 2)} at all iterates 






18 


P. Apkarian, D. Noll, L. Ravanbod 


xK It will turn out that serious iterates never leave the dragon area, only trial points 
may. 

Assume that our current iterate x has f{x) = a and is situated on the right upper part 
of the a-dragon, shown as the blue x in the hgure. That means 

X = (xi,-|a;i + |), /(x)=a, 0 < Xi < ^. 

Then x) = fi+{y) = 2yi + 81 / 2 . If the current trust-region radius is R = A/l3r, then 
the solution of Q is 1/ = x -f r(—2, —3) = (xi — 2r, — |xi + | — 3r). If we follow the 
point y as a function of r along the steepest descent line shown in blue, we will reach the 
points A, B in increasing order at 0 < < r^. Here A is the intersection of the steepest 

descent line with the X 2 axis, reached at = Xi/2. The point B is when the ray meets 
the boundary of the a-dragon, which is the line X 2 = —3xi on the left, reached at 

T 

We have f{A) = fi+{A) = a — ^Xi and f{B) = fi-{B) = —^Xi -t- y®; from here 
on / increases along the ray. The test quotient p for trial points y of this form behaves 
as follows 

f{Xa) - f{y) ^ I 

f(Xa)-<p>‘(y,Xa) I a-?^r+19xi 
f 39r 

The quotient is therefore constant on [0,r^], and decreasing on [r^,cx)). If we trace the 

quotient at the point B as a function of Xi, we see that p = ^ at Xi = 0, and P = ^ 

at Xi = ^. That means if we take the Armijo constant as 7 G (|||, 1), then none of the 
points in [5, 00 ) is accepted, whatever xi G (0, :^]. Let the value r where the quotient p 
equals 7 be called r^. Then < r^, and we have 

Let us for simplicity put B = 1. That means good steps where the trust-region radius 
is doubled are exactly those in (x,A], that is, 0 < r < ta- Such a step is immediately 
accepted, and we stay on the right upper half of the a’''-dragon, where < a, except for 
the point A, which we will exclude later. We hnd for 0 < r < = xi/2: 

= a — 13r >0, x'^ = (xi — 2r, —|xi -|- | — 3r) = (xj*", —‘^x^ + ^). 

Note that a = for the limiting case Xi = 0, and for the limiting case Xi = 

According to step 8 of the algorithm the trust-region radius is doubled = 2R) for 
0 < r < ta, because p = 1 > T = 1. 

The second case is when from the current x with /(x) = a a step with R = \/l3r and 
r G {rA,T^) is taken. Then we end up on the left hand side of the dragon with the new 
situation 

x~^ = (xi — 2r, —|xi + I — 3r), /(x+) = /i_(x+) = —4xi -|- a — 5r = a+. 

By symmetry, this case is analogous to the initial situation, the model at x"*" now being 
/i_. We are now on the upper left side of the smaller a^-dragon. Since 7 < p < T, the 
trust-region radius remains unchanged. 

The third case is when r G [r.^, 00 ). Here the step is rejected, and the trust-region 
radius is halved, until a value r < is reached. 

Since 0** is used, no cutting planes are taken, and we follow the classical trust-region 
method. In consequence, the serious iterates x, x+, X+’'',... stay in the dragons a, a'^, a'^^, . 
and converge to the origin, which is not a critical point of /. Note that we have to assure 
that none of the trial points y lies precisely on the X 2 -axis. Now it is clear that for a given 
starting point x the method has a countable number of possible trial steps and we can 
choose the initial xi G (0, such that the X 2 -axis is avoided, for instance, by taking an 


if 0 < r < 
if < r < 

if < T < cx) 
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irrational initial value. Alternatively, in the case where hits the a; 2 -axis, we might use 
rule ^ to change it slightly to a which is not on the axis. In both cases the method 
will never leave the dragon area, hence convergence based on the Cauchy point fails. 


6. Parametric robustness 


We consider an LFT plant |1H] with real parametric uncertainties W„(P, hi), where 



( X 

= Ax 

+ 

Bpp 

+ 

Bu,w 

(18) 

PM : < 9 

= CqX 

+ 

DqpP 

+ 

Dqw'W 


h- 

= CzX 

+ 

DzpP 

+ 

L-^zwW 


and X G is the state, w G the vector of exogenous inputs, and z G the 
regulated output. The uncertainty channel is dehned as p = Aq, where the uncertain 
matrix A is without loss assumed to have the block-diagonal form 


(19) 


A diag ,..., 


with 6i,... ,6m representing real uncertain parameters, and r* giving the number of repe¬ 
titions of 6i. We write 6 = {6i,..., 6m) and assume without loss that 5 = 0 represents the 
nominal parameter value. Moreover, we consider 6 G M'” in one-to-one correspondence 
with the matrix A in (19). 


6.1. Worst case Woo-performance over a parameter set. Our hrst problem concerns 
analysis of the performance of a system (18) subject to parametric uncertainty. In order to 
analyze the robustness of (18) we compute the worst-case Woo performance of the channel 
w ^ z over a given uncertain parameter range normalized to A = [—1,1]™. In other 
words, we compute 

(20) h* = max{||T^2(5)||oo : 6 G A}, 

where T^z( 6 ) is the transfer function ; 2 (s) = tFu{P{s), A)w{s), or more explicitly, 

;,(s) = [P22 (s) + P2i(s)Z\(/ - Pn(s)Z\)-'Pi2(s)] n;(s). 

The significance of (20) is that computing a critical parameter value 5* G A which 
degrades the Woo-performance of (18) may be an important domino in assessing the prop¬ 
erties of a controlled system (18). We refer to |3] where this is exploited in parametric 
robust synthesis. 

Solving (20) leads to a program of the form ([^ if we write (20) as minimization of 
h_(5) = —||T«, 2 ( 5 )||oo over the convex A. The specihc form of A strongly suggest the use 
of the maximum norm |5|oo = max{|5i|,..., |5m|} fo dehne trust-regions. Moreover, we 
will use the standard model 0 * of h_(5) = —||T«; 2 ( 5 )||oo, as is justihed by the following 


Lemma 6. Let D = {6 : Tz^{6) is internally stable}. Then h_ : 6 ^ —||P2«)(5)||oo 'is 
upper-C^ on D. 

Proof. It suffices to prove that : 5 i—>■ ||T„,z(5)||oo is lower C^. To prove this, recall that 
the maximum singular value has the variational representation 

ct{G) = sup sup . 

||li||=l ||^;||=1 

Now observe that h->• \z\, being convex, is lower-G^ as a mapping —)■ M, so we may 

write it as 

\z\ = sup P(z, 1) 
zeL 
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for jointly of class and a suitable compact set L. Then 
(21) hj^{5) = sup sup sup sup iP'ja;)n,/) , 

||u||=l llwll^l /GL 


where = {juj : co G M U {c)o}} is homeomorphic with the 1-sphere. This is a represen¬ 
tation of the form (17) for where the compact space is IK := x {u : ||u|| = 1} x {u : 
||u|| = 1} X L, F is F{6,ju, u, v, 1) := F ju)v, l) and y = {ju, u, v,l). □ 


Theorem 3 (Worst-case H^o norm on A). Let 6^ G A 6e the sequence generated by the 
standard trust-region algorithm applied to program (20) based on the standard model of 
h_. Then the 6^ converge to a critical point 6* of {20). 


Proof. By Lemma Algorithm coincides with a classical first-order trust-region algo¬ 
rithm, with convergence in the sense of subsequences. Convergence to a single critical 
point then follows by observing that satisfies a Lojasiewicz inequality. □ 


6.2. Robust stability over a parameter set. In our second problem we wish to check 


whether the uncertain system (18) is robustly stable over the uncertain parameter set 
A = [—1,1]™. This can be tested by maximizing the spectral abscissa over A; 


( 22 ) 


a* = max{a (A((5)) : 6 G A}, 


where A(5) is the closed-loop system matrix 


(23) 


A(S) = A + B^A (/ - D„A)-' C„ 


and where the spectral abscissa of A G is a{A) = max{Re(A) : A eigenvalue of A}. 

The decision is now as follows. As soon as a* > 0, the solution S* of (22) represents a 


destabilizing choice of the parameters, and this may be valuable information in practice, 
see [3]. On the other hand, if the global maximum has value a* < 0, then a certificate for 
robust stability over 5 G A is obtained. 

Global maximization of (22) is known to be NP-hard laanD], so it is interesting to 


use a local optimization method to compute good lower bounds. This can be achieved by 
algorithmic because (22) is clearly of the form (jC if maximization of a is replaced by 
minimization of —a over A. In our experiment additional speed is gained by adapting 
the trust-region norm |5|oo = max{|(5i|,..., 15^1} to the special form A = [—1,1]™ of the 
set C, and the standard model 0** of a_(5) = —a!(A((5)) is used. With these arrangements 
the method converges fast and reliably to a local optimum, which in the majority of cases 
can be certified a posteriori as a global one. 

In order to justify the use of the standard model in Algorithm [C we have to show that 
a_ is upper-G^, or at least that its standard model is strict. Here the situation is more 
delicate than in section 6.1 We start by observing the following 


Lemma 7. Suppose all active eigenvalues of A{d) at S are semi-simple. Then a-{5) = 
—Q((A(5)) is Clarke subdifferentiable in a neighborhood of 5. 

Proof. This follows from m- A very concise proof that semi-simple eigenvalue functions 
are locally Lipschitz could also be found in |27]. □ 


That a±((5) = ±a;(A(5)) may fail to be locally Lipschitz was first observed in |TT]. This 
may lead to difficulties when a+ is minimized. In contrast, in our numerical testing it 
is a_(5) = —a{A{6)) which is minimized, and this behaves consistently like an upper- 
function. Theoretically we expect a_ to have a strict standard model if all active 
eigenvalues of A((5*) are semi-simple. An argument indicating that its standard model is 
at least directionally strict is given in [3l V.C]. See |29] for more information on a±. 
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Theorem 4 (Worst-case spectral abscissa on A). Let 6^ E A. be the sequence generated 
by Algorithm^ for program (22), where the standard model 0** of is used. Suppose 
every accumulation point S* of the sequence is simple. Then the sequence 6^ converges 
to a critical point of {22). 

Proof. We apply Theorem to get convergence in the sense of subsequences. □ 


6.3. Distance to instability. Our third problem is related to the above and concerns 
computation of the structured distance to instability of (18). Suppose the matrix A in 


(18) is nominally stable, i.e., ^(5) is stable at the nominal <5 = 0. Then the structured 
distance to instability is dehned as 

(24) d* = max{(i > 0 : ^4(5) stable for all |(5|oo < d}, 

where A(5) is given by (23), and |(5|oo = max{|5i],..., |5m|}- Equivalently, we may con¬ 


sider the following constrained optimization program 


minimize t 

(25) subiect to —t<5i<t 

a (21(5)) > 0 

with decision variable x = (t, 5) G Introducing the convex set C = {(t, 5) : —t < 

5j < = 1 ,... ,m}, this can be transformed to program 0 it we minimize an exact 

penalty objective f{x) = t + cmax {0, —a (A(5))} with a penalty constant c > 0 over C. 

It is clear that the objective of / has essentially the same properties as a_. It suffices to 
argue that 5max{0, —a(A(5))} = co{0}Uc?a_(5) at points 5 where a_ is locally Lipschitz 
and a_(5) = 0. Indeed, the inclusion C holds in general. For the reverse inclusion it 
suffices to observe that 0 G c?max{0, —q;( 21(5))} for those 5 where a_(5) = 0. This is 
clear, because 0 is a minorant of this max function. We may then use the following 


Lemma 8. Suppose f = max{/i, /2} and fi has a strict model 0*. Then cj) = max{0i, 02} 
is a strict model of f at those x where df{x) = co {dfi{x) U df 2 {x)). 


Proof. In fact, the only axiom which does not follow immediately is (Mi). We only know 
di(f)i{x,x) C dfi{x), so di(j){x,x) = co (cli0i(a:, a:) U c?i02(a^, a:)) C co {dfi{x) U df 2 {x)). 
For those x where the maximum rule is exact, this implies indeed di(j){x,x) C df{x). □ 


This means that we can use the model 4>{6', t', 6, t) = T-|-cmax{0, 5)} in Algorithm 

gto solve (|^, naturally with the same proviso as in section where we need the 

standard model 0** of a_ to be strict. 


7. Experiments 


In this part experiments with algorithm applied to programs (20), (22) and (24) 
reported. 


are 


7.1. Worst-case if oo-norm. We apply algorithm to program (20). Table shows the 
result for 27 benchmark systems, where n is the number of states, and column 4 gives the 
uncertain structure [ri... r^] according to (19). An expression like 1^3^!^ corresponds 
to [rir2r3r4r5] = [11131]. The values achieved by algorithm are h* in column 6, 
obtained in t* seconds CPU. To certify h* we use the function WCGAIN of |49| . which is 


a branch-and-bound method tailored to program (20). WCGAIN computes a lower and an 
upper bound h, h shown in columns 5,7 within seconds. It also provides a 5 G A 
realizing the lower bound. 

The results in table [T] show that h* is certihed by WCGAIN in the majority of cases 1- 
5,7-9,11-13,16,17. Case 15 leaves a doubt, while cases 6,10,14,24 are failures of WCGAIN. 
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Benchmark 

n 

Structure 

h 

h* 

h 

t* 

h/h* 

twc/U 

1 

Beaml 

11 

1^3M^ 

1.70 

1.71 

1.70 

1.02 

0.99 

13.29 

2 

Beam2 

11 

1^3M^ 

1.29 

1.29 

1.29 

0.36 

1 

32.68 

3 

DC motor 1 

7 


0.72 

0.72 

0.72 

0.51 

1.01 

14.49 

4 

DC motor 2 

7 

1^2^ 

0.50 

0.50 

0.50 

0.13 

1 

45.02 

5 

DVD driver 1 

10 

1^3M^3^ 

45.45 

45.45 

45.46 

0.23 

1 

189.31 

6 

Four-disk system 1 

16 


3.50 

4.56 

3.50 

0.44 

0.77 

343.35 

7 

Four-disk system 2 

16 

1^3M^ 

0.69 

0.68 

0.69 

0.34 

1.01 

558.03 

8 

Four-tank system 1 

12 


5.60 

5.60 

5.60 

0.32 

1 

5.72 

9 

Four-tank system 2 

12 

U 

5.60 

5.57 

5.60 

0.29 

1 

7.32 

10 

Hard disk driver 1 

22 

1^2n4 

243.9 

7526.6 

Inf 

0.96 

Inf 

73.10 

11 

Hard disk driver 2 

22 

1^2‘^U 

0.03 

0.03 

0.03 

0.20 

1.12 

314.92 

12 

Hydraulic servo 1 

9 


1.17 

1.17 

1.17 

0.34 

1 

10.94 

13 

Hydraulic servo 2 

9 

U 

0.7 

0.70 

0.7 

0.33 

1.01 

11.69 

14 

Mass-spring 1 

8 

U 

3.71 

6.19 

3.71 

0.31 

0.60 

3.54 

15 

Mass-spring 2 

8 


6.84 

6.84 

7.16 

0.13 

1.05 

7.05 

16 

Missile 1 

35 

1^6^ 

5.12 

5.15 

5.12 

0.46 

0.99 

272.54 

17 

Missile 2 

35 

1^6^ 

1.83 

1.82 

1.83 

0.22 

1 

1183.5 

18 

Filter 1 

8 

U 

4.86 

4.86 

4.86 

0.32 

1 

3.41 

19 

Filter 2 

3 

IT 

2.63 

2.64 

2.63 

0.27 

1 

4.06 

20 

Filter-Kim 1 

3 

IT 

2.95 

2.96 

2.95 

0.24 

1 

3.4 

21 

Filter-Kim 2 

3 

U 

2.79 

2.79 

2.79 

0.07 

1 

12.95 

22 

Satellite 1 

11 

ThTP 

0.16 

0.17 

0.16 

0.33 

1 

86.17 

23 

Satellite 2 

11 

TW 

0.15 

0.15 

0.15 

0.70 

1 

41.09 

24 

Mass-spring-damper 1 

13 

IT 

7.63 

8.85 

7.63 

0.21 

0.86 

4.88 

25 

Mass-spring-damper 2 

13 

U 

1.65 

1.65 

1.65 

0.08 

1 

13.70 

26 

Robust Toy 1 

3 

1T2^ 

0.12 

0.12 

0.12 

0.56 

1 

4.24 

27 

Robust Toy 2 

3 

1^2^3^ 

20.85 

21.70 

20.91 

0.24 

0.96 

29.19 


Table 1. Benchmarks for worst-case Hoo-noim on A 


On average algorithm was 121-times faster than WCGAIN. The fact that both methods 
are in good agreement can be nnderstood as an endorsement of onr approach. 

7.2. Robust stability over A. In onr second test algorithm is applied to program 
(j2^. We have nsed a bench of 32 cases gathered in Table and algorithm!^ converges 
to the valne a* in t* seconds. To certify a* we have implemented algorithm known as 
integral global optimization, or as the Zheng-method (ZM), based on |47|. Here fi is any 


Algorithm 2. Zheng-method for global optimization a* = max^-g^ f{x) 

> Step 1 (Initialize). Choose initial a < a*. 

1 dnix) 

> Step 2 (Iterate). Compnte ^-. 

m 

> Step 3 (Stopping). If progress of a~^ over a is marginal, stop, otherwise npdate a 
by a~^ and loop on with step 2. 


continnons hnite Borel measnre on A. Nnmerical implementations nse Monte-Carlo to 
compnte the integral, and we refer to m for details. Our numerical tests are performed 
with 2000 ■ m initial samples, and stopping criterion variance = 10“^; cf. m for details. 
The result obtained by ZM are a^M obtained in tiM seconds CPU. 
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Benchmark 

n 

Structure 

a* 

azM 

t* 

tzM 

28 

Beam3 

11 

1^3M^ 

-1.2e-7 

-1.2e-7 

0.19 

32.70 

29 

Beam4 

11 

1^3M^ 

-1.7e-7 

-1.7e-7 

0.04 

33.00 

30 

Dashpot system 1 

17 

l^’ 

0.0186 

0.0185 

0.23 

90.25 

31 

Dashpot system 2 

17 

l^* 

-l.Oe-6 

-l.Oe-6 

0.39 

39.63 

32 

Dashpot system 3 

17 

l^* 

-1.6e-6 

-1.6e-6 

0.08 

39.70 

33 

DC motor 3 

7 


-0.0010 

-0.0010 

0.02 

20.63 

34 

DC motor 4 

7 


-0.0010 

-0.0010 

0.02 

20.74 

35 

DVD driver 2 

10 


-0.0165 

-0.0165 

0.04 

49.29 

36 

Four disk system 3 

16 

1^3M^ 

0.0089 

0.0088 

0.10 

159.61 

37 

Four disk system 4 

16 

1^3M4 

-7.5e-7 

-7.5e-7 

0.29 

73.86 

38 

Four disk system 5 

16 


-7.5e-7 

-7.5e-7 

0.29 

74.36 

39 

Four tank system 3 

12 

1^ 

-6.0e-6 

-6.0e-6 

0.17 

25.81 

40 

Four tank system 4 

12 

D 

-6.0e-6 

-6.0e-6 

0.02 

26.20 

41 

Hard disk driver 3 

22 

1^2^14 

266.70 

266.70 

0.09 

297.21 

42 

Hard disk driver 4 

22 

1^2414 

-1.6026 

-1.6026 

0.06 

80.40 

43 

Hydraulic servo 3 

9 


-0.3000 

-0.3000 

0.04 

51.41 

44 

Hydraulic servo 4 

9 

ly 

-0.3000 

-0.3000 

0.02 

50.95 

45 

Mass-spring 3 

8 

1^ 

-0.0054 

-0.0054 

0.01 

31.59 

46 

Mass-spring 4 

8 

1^ 

-0.0368 

-0.0370 

0.01 

16.94 

47 

Missile 3 

35 

1^6^ 

22.6302 

22.1682 

0.07 

104.18 

48 

Missile 4 

35 

1^6^ 

-0.5000 

-0.5000 

0.07 

51.78 

49 

Missile 5 

35 

1^6^ 

-0.5000 

-0.5000 

0.07 

52.24 

50 

Filter 3 

8 


-0.0148 

-0.0148 

0.06 

7.05 

51 

Filter 4 

8 

T4 

-0.0148 

-0.0148 

0.02 

6.89 

52 

Filter-Kim 3 

3 

1^ 

-0.2500 

-0.2500 

0.01 

12.83 

53 

Filter-Kim 4 

3 

1^ 

-0.2500 

-0.2500 

0.01 

12.90 

54 

Satellite 3 

11 


3.9e-5 

3.9e-5 

0.02 

44.02 

55 

Satellite 4 

11 

WI4 

-0.0269 

-0.0269 

0.02 

26.02 

56 

Satellite 5 

11 

146M4 

-0.0268 

-0.0268 

0.02 

26.08 

57 

Mass-spring-damper 3 

13 


0.2022 

0.2022 

0.01 

8.30 

58 

Mass-spring-damper 4 

13 


-0.1000 

-0.1000 

0.01 

6.91 

59 

Mass-spring-damper 5 

13 

T4 

-0.1000 

-0.1000 

0.01 

6.94 


Table 2. Benchmarks for worst-case spectral abscissa (22). 


A favorable feature of ZM is that it can be initialized with the lower bound a*, and this 
leads to a signihcant speedup. Altogether ZM and algorithmic are in very good agreement 
on the test bench, which we consider an argument in favor of our approach. 


7.3. Distance to instability. In this last part we apply Algorithm to (24) using the 
test bench of Table |C which can be found in jTH]. The distance computed by Algorithm 
ICis d* in column 2 of Table § We certify d* using ZM |17] and by comparing to the local 
method of |18) . 

To begin with, ZM is used in the following way. For a given d* and a conhdence level 
7 = 0.05 we compute 


a = max{Q!(A((5)) : 5 G (1 — 'y)d*A} 


(26) 

and 

(27) 


a = max{a(A((5)) : 5 G (1 -|- 7)0?*A}. 
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tt 

Benchmark 

n 

Structure 

d* 

dp/d* 

Dzm 

t* 

tzM 

60 

Academic example 

5 


0.79 

1 

V 

0.15 

7.3 

61 

Academic example 

4 

1 ^ 

3.41 

1 

V 

0.13 

23.9 

62 

Academic example 

4 

2 ^ 

0.58 

1 

V 

0.15 

97.4 

63 

Inverted pendulum 

4 

1 ^ 

0.84 

1 

V 

0.22 

24.7 

64 

DC motor 

4 

1 ^ 2 ^ 1 ^ 

1.25 

1 

V 

0.19 

37.7 

65 

Bus steering system 

9 


1.32 

0.99 

V 

0.37 

13.8 

66 

Satellite 

9 

2W^ 

1.01 

0.99 

V 

0.3 

20.2 

67 

Bank-to-turn missile 

6 

D 

0.60 

0.99 

V 

0.17 

167.7 

68 

Aeronautical vehicle 

8 


0.61 

0.99 

V 

0.19 

38.9 

69 

Four-tank system 

10 

D 

6.67 

0.99 

V 

0.27 

24.9 

70 

Re-entry vehicle 

6 

3^2^3^ 

6.20 

1 

V 

0.44 

21.8 

71 

Missile 

14 

D 

7.99 

1 

V 

0.25 

24.9 

72 

Cassini spacecraft 

17 


0.06 

1 

V 

0.13 

25.1 

73 

Mass-spring-damper 

7 

G 

1.17 

1 

V 

0.17 

2536.3 

74 

Spark ignition engine 

4 

1'^ 

1.22 

0.99 

V 

0.41 

42.8 

75 

Hydraulic servo system 

8 

1“ 

1.50 

0.99 

V 

0.41 

62.8 

76 

Academic example 

41 

¥1^ 

1.18 

0.99 

V 

0.57 

36.5 

77 

Drive-by-wire vehicle 

4 


1 

0.99 

V 

0.96 

97.0 

78 

Re-entry vehicle 

7 


1.02 

0.98 

V 

0.42 

132.4 

79 

Space shuttle 

34 

fy 

0.79 

0.99 

V 

0.8 

60.9 

80 

Rigid aircraft 

9 


5.42 

1 

V 

0.54 

252.5 

81 

Fighter aircraft 

10 


0.59 

0.99 

V 

1.31 

171.3 

82 

Flexible aircraft 

46 

■^‘20 

0.22 

0.99 

V 

1.26 

180.3 

83 

Telescope mockup 

70 


0.02 

0.99 

V 

1.37 

274.8 

84 

Hard disk drive 

29 


0.82 

1 

V 

2.87 

202.1 

85 

Launcher 

30 


1.16 

0.99 

V 

4.08 

271.2 

86 

Helicopter 

12 

30^ 

0.08 

0.99 

V 

0.85 

70.7 

87 

Biochemical network 

7 


0.00 

1 

failed 

36.76 

- 


Table 3. Benchmarks for distance to instability (24), available in |5n( . 


If a < 0 and a > 0 then d* is certihed by ZM with that conhdence level 7. This happens 
in all cases except 87, where ZM failed due to the large size. 

We also compared d* to the result dp of the technique [H], which is a sophisticated 
tool tailored to problem (24). Column 6 of table shows perfect agreement on the bench 
from [18]. Given the highly dedicated character of [18], this can be understood as an 
endorsement of our optimization-based approach. 


Conclusion 

We have presented a bundle trust-region method for nonsmooth, nonconvex minimiza¬ 
tion, where cutting planes are tangents to a convex local model (p{-,x) of /, and where 
a trust-region strategy replaces the proximity control mechanism. Global convergence of 
our method was proved under natural hypotheses. 

By way of an example we demonstrated that the standard approach in trust-region 
methods based on the Gauchy point fails for nonsmooth functions. We have identihed 
a particular class of nonsmooth functions, where the Gauchy point argument can 
be salvaged. Functions in even when nonsmooth, can be minimized as if they were 
smooth. The class must therefore be regarded as atypical in a nonsmooth optimization 
program, and indeed, nonsmooth convex functions are not in 
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Algorithm was validated numerically on a test bench of 87 problems in automatic 
control, where the versatility of algorithm with regard to the choice of the norm was 
exploited. We were able to compute good quality lower bounds for three NP-hard opti¬ 
mization problems related to the analysis of parametric robustness in system theory. In 
the majority of cases, posterior application of a global optimization technique allowed us 
to certify these results as globally optimal. 
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